Data discovery and description service

ABSTRACT

The subject disclosure relates to one or more computer-implemented processes for collecting, analyzing, and employing annotations of data sources. In particular, an annotation component is configured to receive annotations of data for a data source, wherein the respective annotations comprise different associations of global terms with the data of the data source, a data store is configured to store the annotations, and an interface component is configured to render the data based on the annotations in response to a request for the data. In an aspect, the data store also stores descriptions of the data sources and definitions of the global terms, and the interface component determines a subset of the information in the data store based on the annotations. A method is further provided comprising receiving a global term and determining data sources that have the global term associated with the data thereof based on the information in the data store.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation application of co-pending U.S. patent application Ser. No. 13/310,517, entitled “Data Discovery and Description Service” and filed on Dec. 2, 2011, the entire disclosure of which is hereby incorporated by reference. This application is related to U.S. patent application Ser. No. 13/329,165 (Docket No. 333533.01) entitled “Gesture Inferred Vocabulary Bindings” and filed on Dec. 16, 2011, the entire disclosure of which is hereby incorporated by reference.

TECHNICAL FIELD

The subject disclosure relates generally to facilitating discovery of data and community data enrichment by capturing and storing associations of data with vocabularies.

BACKGROUND

There is a vast amount of data available today, and data is now being collected and stored at a rate never seen before. Further, through the employment of various systems such as the Open Data Protocol (OData), data is being freed from specific applications and formats. As a result, data is becoming freely accessible and integrated into new uses.

However, although data may be accessible, a new user of the data may not know it exists. Although the user may employ a variety of search engines in an attempt to find the data, if the user does not employ the proper search terms or combination of terms, the user may never come across it. Even if the user finds a data set, the user may not know what the data set is, let alone how to use it. For example, data created for a specific application may be structured and described in a unique way for that application. As a result, a new user of the data may have to spend time and resources parsing the data in order to determine what it is and how to use it. Further, after examining the data, the user may learn that the data is not what he/she wanted or that the data is not appropriate for his/her intended application. In addition, because data may be structured and described in different ways depending on the source of the data, searching for data can result in under-inclusive or over-inclusive results.

In addition, as more and more data is shared, it can be assumed that multiple users will employ the same data sources. For example, multiple users will likely employ the same data sources for similar applications in different manners, multiple users will employ different combinations of data sources, or multiple users will employ the same data sources for different applications. However, no method exists to capture, learn from, and share the various user applications and interpretations of data. In other words, any user interaction with data cannot be shared for interpretation and application in another context. In existing systems, the user must change the data and share the actual changed data. Therefore, any modification of usage of data or enhancement of the data is captured for one context of use with a specific application by the actual application.

The above-described deficiencies of today's techniques are merely intended to provide an overview of some of the problems of conventional systems, and are not intended to be exhaustive. Other problems with conventional systems and corresponding benefits of the various non-limiting embodiments described herein may become further apparent upon review of the following description.

SUMMARY

A simplified summary is provided herein to help enable a basic or general understanding of various aspects of exemplary, non-limiting embodiments that follow in the more detailed description and the accompanying drawings. This summary is not intended, however, as an extensive or exhaustive overview. Instead, the sole purpose of this summary is to present some concepts related to some exemplary non-limiting embodiments in a simplified form as a prelude to the more detailed description of the various embodiments that follow.

In one or more embodiments, the disclosed subject matter can relate to an architecture that can facilitate discovery of data and community data enrichment by capturing and storing associations of data with vocabularies. In accordance therewith, provided is a computer-implemented system, comprising an annotation component configured to receive a first annotation of data for a data source, wherein the first annotation comprises an association of a first global term with the data for the data source, a data store configured to store the first annotation, and an interface component configured to render the data based on the first annotation in response to a request for the data. In an aspect, the annotation component is further configured to receive a second annotation of the data for the data source, wherein the second annotation comprises at least one of a different association of the first global term with the data for the data source or an association of a second global term with the data for the data source, the data store is further configured to store the second annotation, and the interface component is further configured to render the data based on the first and second annotations in response to another request for the data.

In another embodiment, a method is provided comprising receiving annotations, wherein the annotations include associations of global terms with data of data sources, storing information, the information including descriptions of the data sources, definitions of the global terms, and the annotations, and determining a subset of the information based on the annotations. In an aspect, the determining of the subset of information includes receiving a global term and determining data sources that have the global term associated with the data thereof.

In yet another embodiment, disclosed is a computer-readable storage medium comprising computer-executable instructions that, in response to execution, cause a computing system to perform operations, comprising receiving annotations, wherein the annotations include associations of global terms with data of data sources, storing information, the information including descriptions of the data sources, definitions of the global terms, and the annotations, and generating subsets of the information in response to requests based on the annotations. The operations can include receiving selections of at least one of a data source, an annotation, or a global term included in the subsets, and tracking the selections. In an aspect, the operations further comprise determining a relationship between respective requests and respective selections, and determining the subsets of the information based on the relationship. These and other embodiments are described in more detail below.

BRIEF DESCRIPTION OF THE DRAWINGS

The system and methods for representing synchronization knowledge and/or partial knowledge for multiple nodes sharing subsets of a set of information are further described with reference to the accompanying drawings in which:

FIG. 1 illustrates a block diagram of an exemplary non-limiting system that can facilitate sharing of vocabularies to improve the consumption of data that has been associated with a vocabulary or global term;

FIG. 2 illustrates a block diagram of another exemplary non-limiting system that can facilitate sharing of vocabularies to improve the consumption of data that has been associated with a vocabulary or global term;

FIG. 3 illustrates a flow diagram of an example implementation of a data description and delivery service in accordance with an embodiment;

FIG. 4 illustrates a flow diagram of another example implementation of a data description and delivery service in accordance with an embodiment;

FIG. 5 illustrates a process for storing an annotation of a data source and rendering the annotation upon a request for data from the data source;

FIG. 6 illustrates a process of community enrichment of data;

FIG. 7 illustrates a process for issuing queries against a data description and delivery service in accordance with various embodiments;

FIG. 8 illustrates a process of tracking use of a data description and delivery service for business intelligence analysis;

FIG. 9 is a block diagram representing an exemplary non-limiting networked environment in which the various embodiments may be implemented; and

FIG. 10 is a block diagram representing an exemplary non-limiting computing system or operating environment in which the various embodiments may be implemented.

DETAILED DESCRIPTION

Introduction to Vocabularies

Certain subject matter disclosed herein is directed to the use of vocabularies to facilitate discovery of data and community data enrichment. As used herein, the term data is employed to describe machine-readable data. In a traditional sense, a person's vocabulary is the set of words within a language that are familiar to that person. Each of the words for a particular language has an agreed upon meaning by those individuals who adopt the language. The words of the language are used merely as the vehicle to express the agreed upon meaning behind them. Therefore, the more words a person acquires in her vocabulary, the better she can clearly express a concept to another individual who understands the meaning of a word employed. A person's vocabulary usually develops with age, and serves as a useful and fundamental tool for communication and acquiring knowledge.

The concept of vocabularies can be used as a tool for enabling communication and enrichment of machine-readable data. Our world is awash in data. Vast amounts exist today, and more is created every year. In order to capitalize on the value of data, various methods have been established that allow client applications and associated individuals to freely access data. For example, the Open Data Protocol, commonly called OData, enables access to diverse data in a common way. OData is a network protocol for querying and updating data that provides a way to unlock data and free it from silos that may exist in applications. OData does this by applying and building upon existing World Wide Web (Web) technologies such as Hypertext Transfer Protocol (HTTP), Atom Publishing Protocol (AtomPub), and JavaScript Object Notation (JSON) to provide access to information from a variety of applications, services, and stores. OData can be used to expose and access data from a variety of sources including but not limited to: relational databases, file systems, content management systems, and traditional Web sites.

Vocabularies serve as a mechanism to allow producers of data to share more information in a way that can be intelligently understood on the consumption side, resulting in a higher fidelity experience for the consumer. In particular, vocabularies associate meaning with data such that when a client application recognizes a vocabulary associated with data, the client application can automatically understand how to read the data. For example, the consumer application Sesame Data Browser (Sesame) has been configured to render the results of OData queries on a map. Sesame does this by looking for specifically named properties in data embodied in a query result which it guesses represent an entity's location. However, because Sesame must guess which properties represent an entity's location, the accuracy of the output is suboptimal. In order to solve the guessing problem, a vocabulary can be employed by the producer of the data to tell the consumer (e.g., Sesame) which property is the entity's location.

Vocabularies are made up of a set of related global terms, which when used, can express some idea or concept. For example, different words can be employed to relay some idea, concept, or meaning associated with data. As used herein, these words are referred to as global terms. In an aspect, global terms can indicate attributes of data. For example, a global term can indicate whether something can be used as a title or a summary. In another aspect, global terms can indicate structure. For example, a global term can indicate the structure of a person's name and contact information. In an aspect, such a vocabulary can be coined a “person vocabulary” and have global terms for “first name,” “last name,” “surname,” and so forth. The global terms can potentially describe a structure that can be mapped to some substructure of the data source for an OData feed. The data source might include something called full name. In an aspect, the “person vocabulary” might parse the full name from space to space to map one field or property in the data to the multiple different terms of “first name,” “last name,” and “surname.”
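
By way of non-limiting illustration, the mapping of a single full-name field onto several global terms can be sketched as follows. The function name `map_full_name` and the dictionary keys are hypothetical and are not part of the disclosure, and splitting on whitespace is an assumption that will not hold for every naming convention.

```python
def map_full_name(full_name: str) -> dict[str, str]:
    """Map one "full name" value onto person-vocabulary global terms.

    Assumption: the first whitespace-separated token is the first name and
    the last token is the last name; any middle tokens are ignored.
    """
    parts = full_name.split()
    if not parts:
        return {}
    mapped = {"first name": parts[0]}
    if len(parts) > 1:
        mapped["last name"] = parts[-1]
    return mapped


print(map_full_name("Ada Lovelace"))
```

A consumer that recognizes the terms “first name” and “last name” can then bind to the mapped values without knowing that the source stored them in one field.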

With regard to the Sesame browser “guessing” problem above, a “location” vocabulary could be employed to determine which property is an entity's location. For example, an OData query result may include actual latitude and longitude. In order for the Sesame browser to identify what properties of the query result are the actual latitude and longitude, a location vocabulary can be established and associated with the query results. Accordingly, regardless of what the actual latitude and longitude fields are called in the results, as long as they are associated with global terms for latitude and longitude, the actual latitude and longitude can be identified by a client application that understands the “location” vocabulary.
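
As a minimal sketch of this idea, and assuming annotations are available to the client as a mapping from property name to global term, a client could locate the coordinates without guessing at field names. The term identifiers `location.latitude` and `location.longitude`, the function `find_location`, and the sample feeds are all hypothetical.

```python
LATITUDE = "location.latitude"    # hypothetical global term identifiers
LONGITUDE = "location.longitude"


def find_location(record: dict, annotations: dict[str, str]):
    """Return (latitude, longitude) for a record, or None if not annotated.

    `annotations` maps a property name in the data source to a global term.
    """
    property_for_term = {term: prop for prop, term in annotations.items()}
    lat_prop = property_for_term.get(LATITUDE)
    lon_prop = property_for_term.get(LONGITUDE)
    if lat_prop is None or lon_prop is None:
        return None
    return float(record[lat_prop]), float(record[lon_prop])


# Two feeds that name their coordinate fields differently.
feed_a = ({"Lat": 47.64, "Long": -122.13}, {"Lat": LATITUDE, "Long": LONGITUDE})
feed_b = ({"y": "51.5", "x": "-0.12"}, {"y": LATITUDE, "x": LONGITUDE})

for record, annotations in (feed_a, feed_b):
    print(find_location(record, annotations))
```

The same client code handles both feeds because it depends only on the global terms, not on the property names chosen by each producer.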

A conceptual schema definition language (CSDL) schema supports annotations, which can be used as an example to refer to a vocabulary and its global terms. In an aspect, if you ask for an OData service's metadata (~/service/$metadata), you get back an entity framework conceptual model (EDMX) document that contains a CSDL schema. For example, an EDMX document could be presented as the following:

<EntityType Name="Person" display:Title="Firstname Lastname">
  <Key>
    <PropertyRef Name="ID" />
  </Key>
  <Property Name="ID" Type="Edm.Int32" Nullable="false" />
  <Property Name="Firstname" Type="Edm.String" Nullable="true" />
  <Property Name="Lastname" Type="Edm.String" Nullable="true" />
  <Property Name="Email" Type="Edm.String" Nullable="true">
    <validation:Constraint>
      <validation:Regex>^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}$</validation:Regex>
      <validation:ErrorMessage>Please enter a valid EmailAddress</validation:ErrorMessage>
    </validation:Constraint>
  </Property>
  <Property Name="Age" Type="Edm.Int32" />
</EntityType>

When looking at the above EDMX document, the words EntityType can indicate the vocabulary. Global terms employed as part of the vocabulary can include Constraint and Title, for example. Thus, for an EntityType vocabulary, the EntityType definition includes (but is not limited to) both a structural annotation (validation:Constraint) and a simple attribute annotation (display:Title).
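
By way of non-limiting illustration, a consumer could read both kinds of annotation from such an EntityType with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree`. The namespace URIs bound to the `display` and `validation` prefixes are hypothetical, since the example document does not declare them, and the schema is abbreviated.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; the example in the text does not declare any.
DISPLAY_NS = "http://example.org/vocab/display"
VALIDATION_NS = "http://example.org/vocab/validation"

SCHEMA = r"""
<EntityType Name="Person"
    xmlns:display="http://example.org/vocab/display"
    xmlns:validation="http://example.org/vocab/validation"
    display:Title="Firstname Lastname">
  <Property Name="Firstname" Type="Edm.String" />
  <Property Name="Email" Type="Edm.String">
    <validation:Constraint>
      <validation:Regex>^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}$</validation:Regex>
    </validation:Constraint>
  </Property>
</EntityType>
"""


def read_annotations(xml_text: str):
    """Return the display:Title value and a {property name: regex} mapping."""
    entity = ET.fromstring(xml_text.strip())
    # Simple attribute annotation on the EntityType element itself.
    title = entity.get(f"{{{DISPLAY_NS}}}Title")
    # Structural annotation nested inside individual Property elements.
    regex_path = f"{{{VALIDATION_NS}}}Constraint/{{{VALIDATION_NS}}}Regex"
    constraints = {}
    for prop in entity.findall("Property"):
        regex = prop.find(regex_path)
        if regex is not None:
            constraints[prop.get("Name")] = regex.text
    return title, constraints


title, constraints = read_annotations(SCHEMA)
print(title, constraints)
```

A client that does not recognize either namespace can ignore the annotations and still read the plain Property definitions.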

The set of global terms for a vocabulary can be related in a variety of ways. For example, the set of related global terms can be related by the consumer application that generally applies the set of global terms, such as Microsoft™ Excel or Sesame. In another aspect, vocabularies can consist of global terms which serve a common purpose such as validation. In another aspect, a vocabulary can be employed to relate capabilities, such as whether a column is writable, readable, nullable, or updatable. In yet another aspect, vocabularies can consist of global terms that are grouped together based on type. For example, a vocabulary based on type can include but is not limited to any of the following: a creative work vocabulary, an event vocabulary, an intangible vocabulary, an organization vocabulary, a person vocabulary, a place vocabulary, a product vocabulary, a display vocabulary, a relationships/social graph vocabulary, a catalogue vocabulary, etc.

It should be appreciated that a word or term used to identify a particular vocabulary can be entirely arbitrary. Employing names that already have meaning associated with them per the spoken English language, or other language for that matter, merely capitalizes on a pre-established association between words and the meaning many people already attribute to those words. For example, a vocabulary can be identified by a global term that includes numbers, symbols, or made-up words. The important thing is that the meaning behind the vocabulary and/or its global terms is accepted by those who employ the vocabulary and/or global term.

In an aspect, each of the vocabularies described above based on type includes at least one global term defining the type of vocabulary itself. For example, each of the terms creative work, event, intangible, organization, person, place, product, display, and relationships can be considered a global term. In addition, as noted above, a vocabulary includes a set of related global terms. The related global terms of a particular vocabulary can include any conceivable noun, verb, or adjective employed as a term to describe something that has been or can be related to the particular vocabulary type. For example, an event vocabulary can include global terms such as but not limited to: business event, children's event, comedy event, festival, food event, sports event, schedule, calendar, speakers, etc.

In addition, each global term can also include a definition. In an aspect, the definition of a global term can include a literal definition. For example, the definition of the global term speaker could be “a person who serves as a presenter,” or the definition of the global term display could be “a manner for presenting an object.” In another example, terms in vocabularies can be formally defined. For example, the semantic web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. Vocabularies as described herein can expand upon the formal definitions created for terms employed by the semantic web. In addition, the definition of a global term can further include additional global terms, such as child components associated with the global term. In this respect, for example, the global term speaker could include within the definition a global term for full name, first name, last name, speech, speech time, etc.

In another aspect, the definition of a global term can include associations to other global terms which could impart the same meaning as the global term itself, such as a synonym of the global term. For example, with respect to the global term “speaker,” a synonym global term could be “presenter” or “orator.” In another example, if the global term is “address,” its definition might include child components for street, give synonyms for address such as location or site, or give other global terms that are not exactly address but are highly similar.
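
One possible in-memory shape for such definitions is sketched below, under the assumption that child components and synonyms are each recorded as a list of other global term names. The class `GlobalTerm`, the `TERMS` table, and the helper `related_terms` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalTerm:
    name: str
    definition: str = ""
    children: tuple[str, ...] = ()   # child components of the term
    synonyms: tuple[str, ...] = ()   # terms that impart the same meaning


TERMS = {
    "speaker": GlobalTerm(
        name="speaker",
        definition="a person who serves as a presenter",
        children=("full name", "first name", "last name", "speech", "speech time"),
        synonyms=("presenter", "orator"),
    ),
    "address": GlobalTerm(
        name="address",
        children=("street",),
        synonyms=("location", "site"),
    ),
}


def related_terms(name: str, terms: dict[str, GlobalTerm]) -> set[str]:
    """Return the term itself plus any synonyms listed in its definition."""
    term = terms.get(name)
    if term is None:
        return {name}
    return {name, *term.synonyms}


print(sorted(related_terms("speaker", TERMS)))
```

A lookup for “speaker” can thereby be widened to data annotated with “presenter” or “orator” without the requester knowing those terms in advance.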

Regardless of the global terms employed and the definitions provided, as a whole, vocabularies simply establish a system for associating data under common names. As discussed infra, the term annotate as used herein refers to the assignment or association of a global term to data. Once data is associated with a global term, if the consumer recognizes the global term, the consumer will know how to read the data. Global terms are valuable because of their ability to be re-used. Given data, regardless of whether the data does or does not have metadata associated therewith, association of a global term with that data can result in a blob of reusable metadata to which a consumer can bind without having prior knowledge of the data itself. For example, several data sources could have the same global term applied to them. When a consumer employs those data sources, regardless of whether the consumer knows any details of the underlying data, the consumer can read and display the data in accordance with the global term associated therewith.

Vocabularies group global terms together so that application of a particular vocabulary results in a convenient offering of the appropriate tools needed to express the idea or concept embodied in the vocabulary. These tools are the global terms themselves. Vocabularies allow producers of data to teach consumers of the data richer ways to interpret and handle data. In this respect, vocabularies can range in complexity from simple to complex. A simple vocabulary might consist of a few or even a single global term and tell a consumer which property to use as an entity's title when displaying it in a form. On the other hand, a more complex vocabulary might describe a visit card (vCard) and its components or define min and max properties associated with a range term. In turn, application of the range term to an EntityType, for example, could apply a minimum value of 1 and a maximum value of 100.
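
A range term of this kind can be sketched as a small value object. The class `RangeTerm`, its field names, and the annotated property `Age` are hypothetical; only the bounds of 1 and 100 come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeTerm:
    """A global term carrying min and max properties."""

    minimum: float
    maximum: float

    def accepts(self, value: float) -> bool:
        # Both bounds are treated as inclusive (an assumption).
        return self.minimum <= value <= self.maximum


# Annotating a hypothetical "Age" property with the range from the text.
annotations = {"Age": RangeTerm(minimum=1, maximum=100)}

for candidate in (0, 42, 100):
    print(candidate, annotations["Age"].accepts(candidate))
```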

Once data is associated with a global term, discovery of the data is facilitated through general knowledge of vocabularies. In addition, any use or improvement of data can essentially be captured through association of data with global terms. This output is captured and capitalized on by the data description and delivery service as described in the remainder of the disclosure.

Overview of Data Description and Delivery Service

By way of an introduction, the subject matter disclosed herein relates to various embodiments for facilitating discovery of data and community data enrichment by capturing and storing associations of data with vocabularies. As discussed supra, these associations are referred to as annotations. Annotations include the assignment of one or more global terms to data of data sources, which are accessible by a network. When associated with data, global terms convey some idea or concept regarding the underlying data. This idea or concept can also model the data. As a result, at consumption, upon recognition of the one or more global terms assigned to data, the client can use the global terms it recognizes to influence how the client presents or otherwise processes the data associated with the one or more global terms.

Although data can receive an annotation, the annotation does not permanently fix to the data. In other words, annotations can be separated from data for a data source. As a result, data of a data source can be associated with multiple annotations. Further, anyone can provide annotations for data sources. As a result, the body of annotations for data sources is a community effort. The data description and delivery service is configured to receive annotations and apply those annotations when interfacing clients with data sources that have been annotated, regardless of where the annotations came from. Therefore, any client can receive richly described data from data sources that have been previously annotated. Accordingly, the entire community benefits from the collective annotations. In addition, a client can employ multiple perspectives on data for a single data source depending on the various types of annotations for the data source.

Data description and delivery service 102 links one or more data stores, which store these annotations, thus providing consolidated views of data. Therefore, although the actual data may be stored at a variety of locations, data description and delivery service 102 can provide information directing a client to data sources as well as provide any existing annotation on the data. In addition, the data stores can store a list of data sources or descriptions of data sources, and definitions of global terms. The data description and delivery service further provides an application program interface (API) in the form of an interfacing component to enable discovery of data based on annotations and provide clients access to annotated data from various data sources. Accordingly, the interface component is configured to facilitate an enriched experience with and the consumption of data that has been annotated.

In order to facilitate discovery, via the interfacing component, the data description and delivery service is configured to issue query results against the data store. For example, the data description and delivery service is configured to determine the data sources that have a particular global term assigned to data originating therefrom. In another example, the data description and delivery service is configured to issue query results indicating the global terms applied to data of a particular data source. In yet another example, the data description and delivery service is configured to find data sources based on data annotated with related global terms.
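
The three kinds of query described above can be sketched against a simple in-memory collection of annotations. This is an illustrative sketch only: the classes `Annotation` and `AnnotationStore`, their method names, and the sample feed identifiers are hypothetical, and an actual data store could be organized quite differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    source: str   # identifier or URL of the data source
    element: str  # property or field within the source
    term: str     # global term associated with the element
    author: str   # who contributed the annotation


class AnnotationStore:
    """Keeps annotations apart from the data they describe."""

    def __init__(self) -> None:
        self._annotations: list[Annotation] = []

    def add(self, annotation: Annotation) -> None:
        self._annotations.append(annotation)

    def sources_with_term(self, term: str) -> list[str]:
        """Data sources that have the given global term assigned."""
        return sorted({a.source for a in self._annotations if a.term == term})

    def terms_for_source(self, source: str) -> list[str]:
        """Global terms applied to data of the given data source."""
        return sorted({a.term for a in self._annotations if a.source == source})

    def sources_with_any_term(self, terms: set[str]) -> list[str]:
        """Data sources annotated with any of a set of related terms."""
        return sorted({a.source for a in self._annotations if a.term in terms})


store = AnnotationStore()
store.add(Annotation("feeds/hr", "FullName", "person.full name", "alice"))
store.add(Annotation("feeds/hr", "Office", "location.address", "bob"))
store.add(Annotation("feeds/stores", "Addr", "location.address", "carol"))

print(store.sources_with_term("location.address"))
print(store.terms_for_source("feeds/hr"))
```

Because each annotation records its own author and is held separately from the data, several users can annotate the same source, and every later client benefits from the combined set.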

Moreover, aspects of the disclosed subject matter facilitate tracking of user interaction with the data description and delivery service to collect information that can be analyzed and employed in business intelligence schemes. In particular, annotations embody a user's interpretation and use of data. Multiple annotations can be analyzed to determine popular uses and interpretations of data sources. In addition, by tracking the consumption of data based on annotations, some of the inferences that the data description and delivery service can make include: popular perspectives on data, quality of those perspectives, associations between data sources, associations between global terms, and popular global terms.
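
A minimal sketch of such tracking, assuming each interaction is recorded as a pair of the requested global term and the data source the user went on to select, is shown below. The class `UsageTracker` and its methods are hypothetical, and simple counting stands in for whatever analysis an embodiment actually performs.

```python
from collections import Counter


class UsageTracker:
    """Counts requests and the selections that followed them."""

    def __init__(self) -> None:
        self._term_requests = Counter()
        self._selections = Counter()

    def record(self, requested_term: str, selected_source: str) -> None:
        self._term_requests[requested_term] += 1
        self._selections[(requested_term, selected_source)] += 1

    def popular_terms(self, n: int = 3) -> list[tuple[str, int]]:
        """Most frequently requested global terms with their counts."""
        return self._term_requests.most_common(n)

    def top_source_for(self, term: str):
        """Data source most often selected after a request for `term`."""
        counts = Counter(
            {src: c for (t, src), c in self._selections.items() if t == term}
        )
        return counts.most_common(1)[0][0] if counts else None


tracker = UsageTracker()
tracker.record("person", "feeds/hr")
tracker.record("person", "feeds/hr")
tracker.record("person", "feeds/crm")
tracker.record("location", "feeds/stores")

print(tracker.popular_terms(1))
print(tracker.top_source_for("person"))
```

The relationship between requests and selections captured here could then be used to order future query results for the same term.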

Data Description and Delivery Service

Referring now to the drawings, with reference initially to FIG. 1, depicted is a system 100 configured to facilitate sharing of vocabularies to improve the consumption of data that has been associated with a vocabulary or global term. Aspects of the systems, apparatuses, or processes explained herein can constitute machine-executable components embodied within machine(s), e.g., embodied in one or more computer readable mediums (or media) associated with one or more machines. Such components, when executed by the one or more machines, e.g., computer(s), computing device(s), virtual machine(s), etc., can cause the machine(s) to perform the operations described.

In an embodiment, data description and delivery service 102 employs shared vocabularies to facilitate an enriched experience with data when consumed by clients 108. In an aspect, data description and delivery service 102 employs shared vocabularies to facilitate an enriched experience with data when consumed by clients who have no prior knowledge of the data. An enriched experience with data when consumed refers to the consumption of data that has been pre-coded with one or more global terms. As noted above, the global terms convey a preconceived meaning. When associated with data, global terms convey some idea or concept regarding the underlying data. As used herein, the term annotate refers to the assignment or association of a global term to data. As a result, at consumption, upon recognition of the one or more global terms assigned to data, the client can employ the idea or concept embedded in the one or more global terms. For example, the one or more global terms could define an identity of the underlying data, a manner for use of the data, a manner in which to display the data, a manner in which to interpret the data, or a manner in which to organize the data. In this respect, where the client has knowledge of the meaning of a global term, the client will automatically know how to read or consume annotated data.

In addition, data description and delivery service 102 employs shared vocabularies to facilitate discovery of data and facilitate annotating data. In order to facilitate discovery of data, data description and delivery service 102 is configured to issue query results based on stored annotations of data for data sources. For example, data description and delivery service 102 can determine data sources which have a particular global term assigned to data originating therefrom. In another example, data description and delivery service 102 is configured to issue query results indicating the global terms applied to data of a particular data source. In yet another example, data description and delivery service 102 can employ definitions of global terms to generate a query with related global terms. In an aspect discussed infra, data description and delivery service 102 can also employ definitions of global terms to facilitate annotating data.

Some of the above noted features and advantages of the subject data description and delivery service 102 are embodied in the following example. Suppose a user opens Excel, a client application, and desires to search for data that she wants to use in a spreadsheet document. In order to find data, the user launches a data catalogue 110 that enables the user to look through various data sources. In an aspect, the data catalogue 110 is integrated within the data description and delivery service 102, as depicted in FIG. 1. In another aspect, the data catalogue 110 is external to the data description and delivery service 102 and made accessible through a client application such as the Sesame browser. In yet another aspect, the data catalogue 110 is merely a list of data sources displayed in a browser following a query to the data description and delivery service 102 via interface component 108.

When the catalogue 110 opens, the user can look through the data sources available. For example, these data sources can include OData feeds. In an aspect, the user can employ search terms to find data sources. In particular, as described supra, the user can request data sources that apply a particular vocabulary or a particular global term. For example, the user may request and receive OData feeds for human resources data. The human resources data can be identified by the data description and delivery service 102 based on a vocabulary or global term that has been linked to the human resources data and which represents human resources data.

However, rather than just receiving one or more uniform resource locator (URL) links to the human resources data, the data description and delivery service 102 can provide a description of the data. For example, the description could indicate that “this is a human resources data source,” or “this is human resources data comprising a list of name and contact info for persons,” or “this is human resources data for Company X.” Further, if the user chooses to look at a particular URL link for a data source, the user could receive detailed information such as titles for elements of the human resources data. Then the user could select the URL link and receive sample data for use in Excel. For example, assuming the human resources data includes people data, the sample data could include rich information enabling display of first names and last names and secondarily titles in an Excel spreadsheet. Further, the sample data could provide various visualizations of the data including multiple pivots of the data. For example, based on the annotations of the data, there could be various views of data such as a vCard, a chart, or a graph. The user can then look at the details and different views to determine whether the data source includes data the user wants to employ. She could then select the data source and, in particular, data from the data source annotated in a particular way.

As embodied in the above example, data description and delivery service 102 provides access to sufficient metadata so as to enrich the visualization experience around otherwise unknown data. For example, in the above example, it can be assumed that Excel was not hardwired to a human resources data catalogue and that Excel did not know about the types of entities defined by the human resources data. It can also be assumed that Excel did not know about the semantics of what a title is or what a person is. Nevertheless, by employing the data description and delivery service 102 to discover data sources, the user can receive not only a desired type of data but also a “skin,” a “body,” and a “face” for the data, which enhance the client's experience with the data.

In an embodiment, in order to facilitate the above noted enriched experiences with data and the discovery of data that offers those enriched experiences when consumed, data description and delivery service 102 employs data store 104, annotation component 106, and interface component 108. In an aspect, interface component 108 can further include data catalogue component 110. The data catalogue 110 may be integrated within the data description and delivery service 102, as depicted in FIG. 1. In another aspect, the data catalogue 110 is external to the data description and delivery service 102 and made accessible through a client application such as Sesame Browser via the interface component 108. In yet another aspect, the data catalogue 110 is merely a list of data sources displayed in a browser following a query to the data description and delivery service 102 via interface component 108.

Data store 104 is configured to store information regarding locations and structure of data (i.e., data sources 114), and any additional information describing the data by way of vocabularies that have been applied to the data, such as lineage or authorship. Further, data store 104 is configured to store rich information about global terms and vocabularies through definitions of the global terms and associated vocabularies. In an aspect, the information stored by data store 104 is centrally available through data description and delivery service 102. For example, data store 104 can be provided on a server computer that can be accessed via a network. The network can be public or private. According to this example, data store 104 can be provided on a server computer that integrates data description and delivery service 102. In another aspect, data store 104 can include a federation of multiple data stores internal and/or external to data description and delivery service 102. According to this aspect, a plurality of data stores 104 can serve as “central” locations of data, such as a store for a primary data description and delivery service, and a “central” store for a company. However, each of the data stores can be accessed via data description and delivery service 102 to enable consolidation of views on data via their annotations.

Similarly, in an aspect, data description and delivery service 102 can be provided in a public or private environment/network. For example, a private entity may employ a data description and delivery service which facilitates discovery of data and shared perceptions of data associated with the entity. According to this aspect, the entity may recognize various vocabularies and/or global terms which are affiliated with the entity and/or defined by the entity. In another aspect, data description and delivery service 102 can be employed by a variety of clients 112 over a public network, such as the World Wide Web.

Annotation component 106 is configured to receive an annotation of data for a data source. Interface component 108 is configured to provide clients 112 access to the data store 104. In an aspect, interface component 108 enables a client 112 to query the data store 104 based on the information held therein. In another aspect, interface component 108 is further configured to present query results in the form of a catalogue 110. The query result can include a URL link to specific data of a data source as well as access to additional metadata embodied in the annotations applied to the data.

In an aspect, data store 104 holds three categories of information. The first category includes the location of data. There are many possible sources of data. Applications collect and maintain information in databases, organizations store data in the cloud, and many firms make a business out of selling data. In an aspect, data is located at an entity or service that produces the data. In another aspect, data is located at an entity or service that publishes the data. As used herein, the location of data includes the data source 114. Data sources are accessible by a computing network such as the World Wide Web, the Internet, or an intranet. In general, data sources are identified by a uniform resource identifier (URI) that includes a specific URL and uniform resource name (URN). Data sources 114 are discussed in greater depth infra.

The second category of information held in data store 104 includes definitions of vocabularies and the global terms included in those vocabularies. The definition of a global term encompasses a description of the meaning of the global term. In an aspect, the definition can serve to identify the data. In another aspect, the definition can be descriptive and indicate how the global term is to be applied to data and the resulting output of the data when associated with the global term. Definitions can further include associations between vocabularies and global terms. For example, a vocabulary can employ a distinct nomenclature comprising a particular set of global terms. The definition of a vocabulary can include all of the global terms of that vocabulary and also the definitions of those global terms themselves. Furthermore, the definition of a global term can include rich information regarding associations between the global term and other global terms and data sources. For example, a definition of a global term can include child components of the global term, synonyms of the global term, and related global terms. In another aspect, as described infra, the definition of a global term can include parameters of use in the form of filters.

The third category of information includes annotations. As noted supra, an annotation includes the assignment of a global term to data of a data source. In an aspect, annotations can be thought of as the mapping of vocabularies and global terms to data sources. When a vocabulary is applied to a data source, one or more global terms of that vocabulary are assigned to data of the data source. The assignment of a global term to data indicates how a global term is used for a data source. In an aspect, when a data source is annotated, a file or document is generated that includes metadata outlining how a global term is applied to the data. In addition, the metadata can include additional rich information including definitions of global terms and associations between other global terms.
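
The three categories of information described above can be pictured with a minimal Python sketch. This is only an illustration under stated assumptions, not the disclosed implementation: the class names, field names, and the example URI are hypothetical, and an in-memory structure stands in for whatever storage data store 104 actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    """Category 1: the location (and a description) of data."""
    uri: str                      # hypothetical identifier of the source
    description: str = ""


@dataclass
class GlobalTerm:
    """Category 2: the definition of a global term within a vocabulary."""
    name: str
    vocabulary: str
    definition: str = ""
    synonyms: set = field(default_factory=set)
    related: set = field(default_factory=set)


@dataclass(frozen=True)
class Annotation:
    """Category 3: assignment of a global term to data of a data source."""
    source_uri: str
    element: str                  # the annotated piece of data, e.g. a column
    term: str


@dataclass
class DataStore:
    sources: dict = field(default_factory=dict)
    terms: dict = field(default_factory=dict)
    annotations: list = field(default_factory=list)

    def add_source(self, source):
        self.sources[source.uri] = source

    def define_term(self, term):
        self.terms[term.name] = term

    def annotate(self, annotation):
        # Assumption: only annotations that reference a known source
        # and a defined term are accepted.
        if annotation.source_uri not in self.sources:
            raise KeyError(f"unknown data source: {annotation.source_uri}")
        if annotation.term not in self.terms:
            raise KeyError(f"undefined global term: {annotation.term}")
        self.annotations.append(annotation)


store = DataStore()
store.add_source(DataSource("http://example.org/hr", "human resources data"))
store.define_term(GlobalTerm("title", "display", "primary label of an entity"))
store.annotate(Annotation("http://example.org/hr", "JobTitle", "title"))
```

Keeping annotations separate from both the sources and the term definitions is what would allow the same data source to carry several independent annotations over time.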

By providing annotations in data store 104, those annotations can be made centrally available through the data description and delivery service 102. Accordingly, whenever another client works with a particular data source, the client can choose to employ any previous annotations of the data. For example, data from data source X could be marked or annotated with the global term “company name.” In an aspect, when a client works with data from data source X, the concept or idea imparted by the global term “company name” on the data can be offered to the client. For example, the data assigned to the global term “company name” could be presented in an underlined fashion. In another aspect, data from data sources Y and Z could be annotated with the global term “company name.” Regardless of the meaning the global term “company name” imparts on a client application which interprets the global term “company name,” it is possible that advantages may lie in combining sources X, Y and Z based on the global term “company name.” Thus, because data sources X, Y, and Z have been annotated with the global term “company name,” a client can discover the relationship between the three sources.
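
As a hedged illustration of the “company name” example, the following Python sketch builds an index from global terms to the data sources annotated with them. The tuple layout, the element names, and the function name `sources_by_term` are assumptions made for this sketch only.

```python
from collections import defaultdict

# Each annotation: (data source, annotated element, global term).
# Element names are hypothetical.
ANNOTATIONS = [
    ("X", "col_a", "company name"),
    ("Y", "employer", "company name"),
    ("Z", "vendor", "company name"),
    ("Z", "zip", "zipcode"),
]


def sources_by_term(annotations):
    """Index data sources by the global terms applied to their data."""
    index = defaultdict(set)
    for source, _element, term in annotations:
        index[term].add(source)
    return index


index = sources_by_term(ANNOTATIONS)
# Sources that share the term are candidates for being combined.
related_sources = sorted(index["company name"])
```

Because the index is keyed by the global term rather than by any source-specific column name, sources X, Y, and Z surface together even though each labels the underlying data differently.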

Annotation component 106 is configured to receive, as input, annotations of data for a data source. In an aspect, annotation component 106 receives annotations of data for a data source in the form of a file or document in response to data being annotated. In another aspect, annotation component 106 generates an annotation file for a data source in response to data being annotated by a user or client application. The annotations or annotation files can then be stored in data store 104. The annotation file can include any metadata associated with the data including the assignment of vocabularies and global terms to data of a data source in the form of metadata.

In an aspect, annotation component 106 receives an annotation for a data source when the annotated data is published by the data source. For example, data sources 114 can include a centralized space for sharing documents over a public or private network. In an aspect, a data source can include a SharePoint site. SharePoint is a content management system developed by Microsoft™ Corporation. SharePoint™ allows groups to set up a centralized, password protected space for document sharing. Documents can be stored, downloaded and edited, then uploaded for continued sharing. Accordingly, in an aspect, clients can annotate data locally and store the annotated data in a local database. The client could further choose to publish the data to a public sharing site such as a SharePoint™ site. When the data is published, an annotation file associated with the data is also published and linked to the data source by the annotation component 106. In another aspect, annotation component 106 can extract annotation files associated with data sources.

Data can become annotated in a variety of ways. In an aspect, the manner in which a data source becomes annotated does not affect the future application of those annotations by data description and delivery service 102. In particular, data description and delivery service 102 serves as a marketplace for the input of semantic information regarding data in the form of annotations and the output of additional information regarding the data based on those annotations. In an aspect, a user of a client device or client application can manually annotate data. For example, an individual, say Anna, could use data from a particular data source when working in an Excel spreadsheet. In an aspect, Anna could annotate the data with global terms as she is working with the data. For example, she could apply terms of a display vocabulary and indicate a particular object is a “title,” and another object is a “summary.” In an aspect, client applications understand that the global term “title” indicates displaying the object in bold and the global term “summary” indicates displaying the object in italic in a position following the title. In the above example, Anna has applied a display vocabulary including the terms “title” and “summary” to data of a particular data source, say data source A. With the subject data description and delivery service, this application of vocabularies to data source A can be received by annotation component 106 and stored in data store 104.

In another aspect, a client device or application can employ an inference engine to facilitate applying a global term to data. According to this aspect, the inference engine can be associated with the client application or with the data description and delivery service 102. As discussed infra, data description and delivery service can also include an analysis component and an inference component. These components can analyze annotations and employ tracked data regarding user interaction with the data description and delivery service 102 to make conclusions regarding definitions of global terms, relationships between global terms, and relationships between data sources. Further, analysis and inference components can apply these conclusions regarding global terms and data sources to new user contexts to facilitate annotating data. For example, a user of a client application may manually annotate data with the “zipcode” global term. As a result, the client application can employ data description and delivery service 102 to receive other related global terms that are commonly employed with the “zipcode” global term and use those other related global terms to annotate the data. For example, the client application can automatically annotate the data or the client application can suggest annotations for the data.
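
One simple way such related-term suggestions could be produced is by counting which global terms co-occur with a given term in previously stored annotations. The Python sketch below assumes exactly that; the disclosure does not specify the inference technique, and the history data and the function name `suggest_terms` are hypothetical.

```python
from collections import Counter

# Hypothetical history: the set of global terms applied together
# in each previously stored annotation of some data source.
HISTORY = [
    {"zipcode", "city", "state"},
    {"zipcode", "city"},
    {"title", "summary"},
]


def suggest_terms(term, annotation_sets, limit=3):
    """Suggest global terms that most often accompany `term`."""
    counts = Counter()
    for terms in annotation_sets:
        if term in terms:
            # Count every other term used alongside the given one.
            counts.update(t for t in terms if t != term)
    return [t for t, _count in counts.most_common(limit)]


suggestions = suggest_terms("zipcode", HISTORY)
```

A client application could either apply the returned terms automatically or present them to the user as suggested annotations.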

According to the embodiment above, the client application can let the data description and delivery service do the work and use data description and delivery service 102 to extract conclusions regarding global terms and/or suggested annotations of data. In another aspect, the client application can track user interaction with the software and can employ analysis and inference components to facilitate suggested or automatic annotations of data based on the assumed meaning of the data and relationships between a global term employed and other global terms. Further, a client application may make gesture-based inferences regarding a possible annotation of data and automatically annotate the data based on the inferences.

The above example with Anna demonstrates how data of a data source can become annotated with an annotation of one kind. However, data of a data source can receive a plurality of annotations which may vary in kind. Because global terms impart different meanings, assigning different global terms to data or different combinations of global terms to data can cause a client application to render data in different forms and dimensions of complexity and richness. As a result, when data is annotated it is modeled through the association of global terms to the data. As noted above, related global terms express some idea or concept about the underlying data. As used herein, the idea or concept expressed by annotated data is referred to as a perspective on data.

A perspective on data can embody the manner in which annotations on the data influence the way it is consumed and results from the manner of the annotations on the data for a data source. For example, the manner in which data is annotated can influence the way data is viewed, the possible values that its individual components can take during editing (e.g., range checking), whether the application allows data sets to be combined and over which attributes, etc. A data model is dependent on the annotations on data and the client functionality. For example, a data model can include the manner in which data is viewed on a client device, which is dependent on the application of the data by the client device and the recognition of global terms in an annotation which facilitate the application.

In an aspect, multiple views on data may exist for data of a single data source. For example, data of a data source may be annotated by different clients or by the same client on different occasions. Accordingly, data of a data source can have multiple annotations with assignments of different global terms. For example, a client application may annotate data of a data source with the term “calendar,” and another client application may annotate the data of the data source with the terms “agenda” and “room assignment.” In another aspect, data of a data source can have a plurality of annotations that include application of the same global terms; however, the configuration in which the global terms are applied to the data of the data source can vary. For example, a first user or client may assign global terms “one,” “two,” and “three,” to data objects “one,” “two,” and “three” respectively. In another aspect, a second user or client may assign global terms “one,” “two,” and “three,” to data objects “twelve,” “four,” and “thirty two,” respectively.

In addition, as noted infra, vocabularies can range in complexity from simple to complex. Accordingly, perspectives on data can range in complexity from simple to complex. A vocabulary can include a single global term or multiple global terms. A simple vocabulary might tell a client which property to use as an entity's title when displaying it in a form, whereas a more complex vocabulary might tell someone how to convert an OData person entity into a vCard entry. Further, the degree to which data is annotated can range from low to high. For example, data can be annotated with a single global term which conveys a simple meaning regarding the display or use of the data. However, the data can also be annotated with a plurality of global terms and vocabularies, each of which conveys additional layers of meaning regarding the underlying data. Therefore, because data of a data source can be annotated in a variety of manners, data of a data source can adopt multiple perspectives on data.

In view of the above, according to another embodiment, annotation component 106 is configured to capture multiple annotations for a single data source or a particular set of data and store the annotations in data store 104. In other words, annotation component 106 is configured to compile a plurality of annotations for a single data source. According to this aspect, each time data is annotated by one or multiple clients, the data is enriched with metadata. Regardless of the manner in which data becomes annotated, the annotations are captured by annotation component 106. In this respect, data and/or data sources providing the data can grow in richness of information associated with that data over time. As a result, impressions, thoughts, applications, and modifications of data and data sources by multiple users can be captured and shared. In other words, multiple users can mark up data, make modifications to data, use data in different ways, view data in different ways, make conclusions about data, combine data from various sources, etc. As a result, multiple users can provide perspectives on data and those perspectives can be captured through annotations on the data. Because those annotations are stored in data store 104, those annotations can be shared.

For example, in furtherance of the example above with Anna and data source A, suppose another individual, Bob, utilized data source A and also applied annotations to data. For example, suppose the data source which Anna annotated with the global term “title” further includes a plurality of company locations in the form of addresses. Then suppose Bob works with the data source in Excel and uses the various company addresses. He further annotates the company addresses with a global term for “address.” As a result, certain data of data source A is also annotated with the global term “address.” At this point, data source A has two perspective consumption modes.

Then suppose another individual “Carla” chooses to employ data source A. In an aspect, Carla can view data source A according to both perspective consumption modes. Depending on the client application Carla employs to consume the data from data source A, the annotations applied will result in a rendering of the data in a certain form. For example, if Carla is working with Excel she may be able to view the data with titles already bolded and summaries attached and address information applied. In another aspect, she may employ an application which recognizes the “address” global term and automatically presents Carla with a map having the locations of the addresses charted out. Carla can then choose the perspective on data source A which best suits her needs for a particular client application.
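
The dependence of the rendering on both the annotations and the client can be sketched as follows. This Python example is a minimal illustration under assumptions: the record, its field names, the HTML-style markup used for bold and italic, and the `render` function are all hypothetical, and the client shown recognizes only the “title” and “summary” global terms.

```python
# Renderers for the global terms this hypothetical client understands.
RENDERERS = {
    "title": lambda value: f"<b>{value}</b>",    # display in bold
    "summary": lambda value: f"<i>{value}</i>",  # display in italic
}

# A record from data source A (field names and values are made up).
RECORD = {"name": "Contoso", "blurb": "Retailer", "addr": "1 Main St"}

# Two perspectives: each maps a global term to the annotated field.
ANNA = {"title": "name", "summary": "blurb"}
BOB = {"address": "addr"}


def render(record, perspective, renderers=RENDERERS):
    """Render each field using the renderer of its global term, if known."""
    field_to_term = {f: term for term, f in perspective.items()}
    rendered = []
    for field_name, value in record.items():
        term = field_to_term.get(field_name)
        # Fields without a recognized term fall back to raw text.
        formatter = renderers.get(term, str)
        rendered.append(formatter(value))
    return rendered


anna_view = render(RECORD, ANNA)
bob_view = render(RECORD, BOB)
```

With this client, Anna's perspective yields bold and italic output, while Bob's perspective leaves the data in raw form because the client does not recognize “address”; a mapping application that does recognize it could plot the addresses instead.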

Previous systems that captured improvements to data did not include a federation of well-known service endpoints that could be used to capture and share information related to the improvement. As a result, changes, thoughts, and impressions on data could not efficiently be shared with other applications. However, the subject data description and delivery service 102 captures user interaction with data so that enrichments to data can be shared with future users of the data. Enrichments to data include any thoughts, conclusions, uses, interpretations, modifications, and associations to data. Enrichment of data is captured through the association of global terms with the data, which represent a thought, conclusion, use, interpretation, modification or association of the data. Through employment of data description and delivery service 102 in conjunction with use of the service by multiple clients 112, the many clients 112 can capture augmentation events of data and send them to the data store 104 in the form of annotations. Further, any of the clients 112 can openly access such augmentations via the service 102.

In summary, annotation component 106 can compile annotations of data for a particular data source regardless of the way in which data receives the annotations. Because those compiled annotations are stored in data store 104, whenever another client works with a particular data source, the client can choose to employ any previous annotations of the data. The client could also choose not to employ any previous annotation of the data and simply use the data in raw form. As a result, the client can receive a different experience with the data depending on any previous annotations of the data. In this respect, data enrichment can be shared via the application of vocabularies to data of a data source.

Interface component 108 is configured to serve as an application program interface (API) that enables discovery of data based on annotations and provides clients access to data comprising annotations. Accordingly, interface component 108 is configured to facilitate an enriched experience with and the consumption of data which has been annotated. In addition, interface component 108 is configured to find global terms that can be employed to annotate data. For example, interface component 108 can receive a global term and find related global terms. In another example, interface component 108 can receive search terms or phrases and parse the definitions of global terms to return possible global terms that correspond to the search terms or phrases. The possible terms can then be employed by the client application to annotate data.

In an aspect, interface component 108 is configured to issue search queries against the contents of data store 104 based on any of the three types of data stored in data store 104. Therefore, interface component 108 can receive a request for data sources which have a particular vocabulary or global term applied to them. For example, interface component 108 could receive a request for all the data sources which have the global term “movie” applied. In another example, interface component 108 could receive a request for all of the global terms that have been applied to a particular data source. In another example, interface component 108 could receive a request for global terms which are related to or commonly applied in an associated relationship with a specific global term. Relationships between global terms, including synonyms and related terms, are provided in the second category of data in data store 104, definitions of global terms. Accordingly, interface component 108 can parse a definition of a global term to determine additional information about the global term. (Definitions of global terms are discussed in greater depth infra.)
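
The three kinds of request described above can be sketched in Python as three small query functions. This is an illustrative sketch only: the example URLs, element names, term definitions, and function names are assumptions, and in-memory lists stand in for data store 104.

```python
# Hypothetical annotations: (data source URL, annotated element, global term).
ANNOTATIONS = [
    ("http://example.org/films", "Name", "movie"),
    ("http://example.org/films", "Released", "date"),
    ("http://example.org/cinemas", "Showing", "movie"),
]

# Hypothetical definitions of global terms (second category of data).
DEFINITIONS = {
    "movie": {"synonyms": ["film"], "related": ["actor", "director"]},
}


def sources_with_term(term):
    """All data sources that have the given global term applied."""
    return sorted({src for src, _elem, t in ANNOTATIONS if t == term})


def terms_for_source(source):
    """All global terms that have been applied to the given data source."""
    return sorted({t for src, _elem, t in ANNOTATIONS if src == source})


def related_terms(term):
    """Synonyms and related terms read from the term's definition."""
    definition = DEFINITIONS.get(term, {})
    found = set(definition.get("synonyms", []))
    found |= set(definition.get("related", []))
    return sorted(found)


movie_sources = sources_with_term("movie")
film_terms = terms_for_source("http://example.org/films")
movie_related = related_terms("movie")
```

The first two functions query the annotations, while the third parses a definition, mirroring the distinction between the third and second categories of stored information.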

Interface component 108 is further configured to produce a query result comprising the requested information. In turn, client applications can select or employ a data source identified and produced by the interface component 108. In particular, the data source can have annotations associated therewith to facilitate an enriched experience with the data of the data source. In an aspect, an annotation file or document can be retrieved by the interface component 108 from data store 104 in response to selection of a particular data source by a client. In another aspect, the annotation file or document associated with a data source can facilitate a plurality of interpretations or uses of the data source embodied in various perspectives on the data. According to this aspect, interface component 108 can provide multiple views of different perspectives on data for a selected data source based on the annotations applied.

In an embodiment, the interface component 108 produces a query result in the form of a catalogue 110 comprising a list of URL links to data sources comprising the requested information. In another aspect, the catalogue 110 can include additional details regarding a data source and/or the data provided by the data source for a particular link depending on the annotations associated with the data source. In yet another aspect, interface component 108 can further provide URL links to multiple versions of perspectives on data for a data source depending on the number of annotations for a data source. According to this aspect, a data source may have multiple annotation files associated therewith, including one for each perspective on data generated via the different annotation files. Alternatively, a data source may have a single annotation file that compiles multiple annotations of the data source and thus the multiple associated perspectives on data.

Referring back to FIG. 1, in addition to data description and delivery service 102, system 100 includes clients 112 and data sources 114. In general, the term client 112 is used herein to refer to an entity, individual, or computer application, which utilizes annotated data or annotated data sources. The term data source 114 is used to refer to an entity, service, or application that provides data. The data can be annotated or not annotated; however, the data is useful with the subject data description and delivery service upon annotation. Nevertheless, it should be appreciated that a client 112 can generate sources of annotated data. For example, a client can annotate data and publish it as a data source. In another example, a client can utilize annotated data from a data source and further annotate the data in association with additional actions with the data, impressions upon the data, and modifications to the data.

In an embodiment, clients 112 can include any application operating on a computing device configured to consume data. In an aspect, a client can include a computing device employing an application that consumes data. The computing device can be associated with a user. For example, a client device could include a personal computer (PC), a tablet PC, a laptop computer, a server computer, a phone, a smartphone, etc. In another aspect, the term client can refer to an actual individual user of the data via an application. In an embodiment, the application can include any custom application that uses data such as applications on mobile devices, business intelligence (BI) tools, media programs, etc. In another embodiment, applications consume data exposed using the OData protocol. Applications that can consume data exposed using the OData protocol can include but are not limited to: browsers, OData Explorer, Microsoft™ Excel, VisualStudio, LinQPad, Sesame Browser, Client Libraries, OData Helper for WebMatrix, Tableau, Telerik RadGrid for ASP.NET Ajax, Telerik RadControls for Silverlight and WPF, Telerik Reporting, Database .NET v3, Pebble Reports, and (Unofficial) SSIS import script. It should be appreciated that the above list of OData clients is merely presented as an example of the types of applications which consume data, specifically OData. The subject disclosure, however, contemplates all applications which consume data regardless as to whether the data is exposed using the OData protocol or not.

Data sources 114 include any possible source of data that can be accessed via a network. There are many possible sources of data. For example, applications collect and maintain information in databases, organizations store data in the cloud, individuals produce personal data and store it locally, and many firms make a business out of selling data. In an aspect, a data source can include numerous different types of data at a specific location. The specific location is generally identified by a URL. In an aspect, a data source includes a service configured to expose data using the OData protocol. It should be appreciated that any service, individual, program, website, etc. can be configured to expose data using the OData protocol. Some example applications that expose OData data sources include but are not limited to: SAP NetWeaver Gateway, SharePoint 2010, IBM WebSphere, Microsoft SQL Azure, Microsoft Dynamics CRM 2011, GeoREST, Webnodes CMS, Telerik OpenAccess ORM, and tm2o - OData provider for Topic. Some examples of live OData data sources include Facebook Insights, ebay, Netflix, twitpic, Wine.com, Nuget, Nerd Dinner, Windows Live, and Microsoft Pinpoint.

It should be appreciated that the above noted clients and data sources that have conformed to the OData protocol are merely provided as examples of the wide range of possible clients and data sources that can be employed by the subject data description and delivery service. As such, additional clients and data sources are contemplated as the OData ecosystem continues to grow.

Referring now to FIG. 2, depicted is a system 200 configured to facilitate sharing of vocabularies to improve the consumption of data that has been associated with a vocabulary or global term. Similar to system 100, system 200 includes data description and delivery service 202, data store 204, annotation component 206, interface component 208, clients 222 and data sources 224. In an aspect, data store 204, annotation component 206, interface component 208, clients 222 and data sources 224 are analogous to data store 104, annotation component 106, interface component 108, clients 112 and data sources 114. In addition, data store 204, annotation component 206, and interface component 208 are configured to include additional features as noted infra with respect to the additional components of data description and delivery service 202. In particular, data description and delivery service 202 further includes tracking component 210, rating component 212, analysis component 214, inference component 216, definition component 218, and filter component 220. Further, system 200 can include one or more crawlers 226.

In an embodiment, tracking component 210 is configured to track annotations of data sources and client interaction with data description and delivery service 202. Any information monitored or tracked by tracking component 210 can be collected and stored in data store 204. With respect to annotations of data sources, in an aspect, tracking component 210 is configured to monitor the number of times a data source is annotated and the various perspectives on data generated via the annotations. In another aspect, tracking component 210 is configured to track what global terms are applied to data of a data source, how often they are applied, when they are applied, and who or what entity applies them. In another aspect, tracking component 210 is configured to track usage requirements and parameters associated with data annotations. For example, tracking component 210 is configured to determine languages of annotations and client application requirements that employ the annotations.

With respect to client interaction with data description and delivery service 202, tracking component 210 is configured to track the actual consumption of annotated data from data sources. As described supra, interface component 108 (and likewise interface component 208) is configured to present a client a query result which can include URL links to annotated data sources. In an aspect, interface component 108 or 208 can further provide URL links to multiple versions of perspectives on data for a data source depending on the number of annotations for a data source. In an aspect, tracking component 210 is configured to track the consumption of annotated data sources. For example, tracking component 210 can determine when an annotated data source is selected and/or employed by a client. According to this example, tracking component 210 may determine data source A is selected once an hour, once a day, once a week, etc. In another aspect, tracking component 210 can track the consumption of data according to the various perspectives of the data. For example, a data source may have several perspective consumption modes or multiple data sources may have perspective consumption modes that relate to a similar type of data. Tracking component 210 can thus track client selection and consumption of data according to various perspective consumption modes on the data from one or multiple data sources. Further, in yet another aspect, tracking component 210 can monitor the identities of clients that employ data sources and the annotations on the data which they employ (i.e., the resulting perspective on the data that the client employs).

In another embodiment, with respect to client interaction with data description and delivery service 202, tracking component 210 is configured to track client or user patterns with respect to consumption of data from data sources based on annotations and the definitions provided for the global terms of those annotations. For example, data description and delivery service 102 or 202 enables users to discover data sources employing related annotations as outlined in the definitions of global terms. For example, the definition of a global term can include synonymous global terms and/or global terms which are related to a global term (child components of the global term), such as other global terms grouped with the global term by a vocabulary. Therefore, according to an aspect, tracking component 210 is configured to track selection and use of data sources based on like terms or related terms. For example, a user or client may discover two or more data sources employing a common term or a related term and choose to join the two or more data sources at consumption. According to this example, tracking component 210 is configured to monitor when data sources are commonly employed together by a client application.

It should be appreciated that the above examples of information that tracking component 210 is configured to collect are merely intended to present examples of some of the types of information tracking component 210 may collect. It should be appreciated, however, that tracking component 210 is configured to track any type of user interaction with a data source based at least in part on the association of data of that data source with one or more global terms. The information tracked by tracking component 210 can further be stored in data store 204 for future access and analysis.

In addition, system 200 can include one or more crawlers 226. In an embodiment, a crawler 226 traverses the network in which data description and delivery service 202 operates to gather information to enhance the service 202. For example, a crawler 226 can browse the World Wide Web in a methodical, automated manner or in an orderly fashion to find data sources and load descriptions of those data sources into data store 204. In another aspect, crawler 226 can capture contextual information associated with a data source or attributes of a data source (such as usage history) and load the information in data store 204. Although the crawlers are not shown in the drawing, in an aspect, crawlers 226 can sit between the data sources 224 and an instance of the data description and delivery service.

Rating component 212 is configured to enable users of the data description and delivery service 202 to rate annotations of data sources based on the perspectives on data flowing from those annotations. For example, as noted supra, a data source can include a plurality of annotations and thus embody a plurality of perspective modes of consumption. Rating component 212 allows users to rate a particular perspective. In another aspect, rating component 212 is configured to enable users to rate individual global terms and vocabularies. In yet another aspect, rating component 212 is configured to enable users to rate data sources based on the overall quality of the data and data consumption mode/perspective offerings. The ratings received by rating component 212 can further be stored in data store 204 for future access and analysis.

Analysis component 214 is configured to analyze data in data store 204 in order to make conclusions about data, data sources, definitions of global terms and annotations based on the information held in data store 204. These conclusions can be employed by the service to enhance the objective of the service. In another aspect, these conclusions can be employed by business intelligence systems. In yet another aspect, the conclusions can be employed by client applications to facilitate annotating data. As noted supra, tracking component 210 is configured to track any type of user interaction with a data source based at least in part on the association of data of that data source with one or more global terms. The information tracked by tracking component 210 can further be stored in data store 204 for access and analysis by analysis component 214.

According to an embodiment, analysis component 214 is configured to employ tracked data and annotations to determine the following: the degree of popularity of a perspective on data, the quality of a perspective on data, the popularity of a global term, the array of global terms applied to a data source, the reputation of a perspective on data, the clientele distribution of consumption of a data model, the reputation of a data source, the frequency and timing of consumption of a perspective on data, or the location of consumption of a perspective on data. For example, the degree of popularity of a perspective on data can be determined by one or more algorithms that account for the number of times a perspective on data is consumed. In another aspect, the quality of a perspective on data can account for the number of times the perspective on data is employed, the rating of the perspective on data, and the clientele distribution of consumption of the perspective on data. Further, analysis component 214 can employ statistical analysis to associate percentages with conclusions. For example, analysis component 214 can determine the percentages associated with global term or vocabulary usage against a data source as compared to other global term or other vocabulary usage. In an aspect, analysis component 214 can make the above determinations on a routine basis and store the determinations in data store 204. In another aspect, analysis component 214 is configured to perform the above determinations in response to query requests.
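As one illustration of associating percentages with global term usage against a data source, the following sketch computes each term's share of all term applications. The function name `term_usage_percentages` and the flat list input are assumptions made for illustration only; the disclosure leaves the statistical method open.

```python
from collections import Counter


def term_usage_percentages(applied_terms):
    """Return each global term's share (in percent) of all term applications.

    `applied_terms` is a hypothetical flat list with one entry per time a
    global term was applied to the data source under analysis.
    """
    counts = Counter(applied_terms)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: 100.0 * n / total for term, n in counts.items()}


usage = term_usage_percentages(["company name", "company name", "address", "contact name"])
print(usage)
```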

Interface component 208 can further employ conclusions regarding perspectives on data, global terms and data sources as parameters when performing queries against data store 204. As a result, clients and users can request and receive rich business intelligence information based on any of the conclusions made by the analysis component 214. In an aspect, the data description and delivery service 202 can sell the above noted rich business intelligence information. In another embodiment, data description and delivery service 202 can employ conclusions made by the analysis component 214 to optimize user data queries. For example, interface component 208 can not only offer data sources which include a select global term or terms, but also the ratings of the data sources, an indication of the quality of those data sources, and indications of the degree of complexity of a particular perspective on data affiliated with those data sources.

In another embodiment, analysis component 214 is configured to employ tracked data to identify information that can be employed in the definitions of global terms. As a result, analysis component 214 can generate definitions of global terms. The definition can further include a profile of the global term including any conclusions about the use, meaning, or associations of the global term. For example, analysis component 214 can examine annotations to analyze trends in the application of global terms against certain types of data. Similarly, analysis component 214 can identify when data sources are merged based on different user selected global terms to identify a relationship between the terms. Analysis component 214 can further make conclusions about the data sources based on the merge. For example, a client application can choose to merge data source A with data source B based on a common annotation scheme or a related global term. In addition, any rich information known regarding data source A can now be associated with data source B. For example, suppose data source A had people data. Analysis component 214 can then note that data source B relates to people data as well.

In addition, analysis component 214 is configured to analyze annotations to identify patterns for uses of global terms and structure of the underlying data. For example, analysis component 214 is configured to learn associations between global terms and the underlying data structure. In another aspect, analysis component 214 is configured to examine which global terms are employed together and how they are employed together, to determine relationships between the global terms. For example, analysis component 214 can examine annotated data to extract relationships between global terms. Further, analysis component 214 can employ statistical analysis to associate percentages with conclusions regarding global term relationships. For example, analysis component 214 can learn that 80% of the time the “zipcode” annotation is followed by a “map” annotation. Furthermore, analysis component 214 can employ tracked user interaction with the data description and delivery service to discern associations between global terms and data sources. Accordingly, analysis component 214 can employ tracked user patterns and annotations to identify related global terms, similar global terms, and synonymous global terms. Related global terms can include any global terms which have a similar meaning, similar applications, harmonious application or other relationship. Synonymous global terms can include terms which mean the same thing and convey the same concept or idea. It should be appreciated that global terms in different languages may be synonymous.
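The “zipcode” followed by “map” example can be sketched as a simple followed-by frequency count over ordered annotation sequences. The function `follow_probabilities` and its input format (one list of global terms per annotated data source, in the order applied) are assumptions for illustration; they are not taken from the disclosure.

```python
from collections import Counter, defaultdict


def follow_probabilities(annotation_sequences):
    """Estimate how often each global term is immediately followed by another.

    `annotation_sequences` is a hypothetical list of lists, each holding the
    global terms applied to one data source in the order they were applied.
    """
    pair_counts = defaultdict(Counter)
    for sequence in annotation_sequences:
        # Count each adjacent (current term, next term) pair.
        for current, following in zip(sequence, sequence[1:]):
            pair_counts[current][following] += 1

    probabilities = {}
    for term, followers in pair_counts.items():
        total = sum(followers.values())
        probabilities[term] = {f: n / total for f, n in followers.items()}
    return probabilities


observed = [["zipcode", "map"]] * 4 + [["zipcode", "city"]]
print(follow_probabilities(observed)["zipcode"])
```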

Further, as discussed below, the definitions of global terms can include requirements for consumption of data annotated with the global term. For example, client applications or devices may not support the consumption mode exhibited by an annotation with a particular global term. According to this aspect, analysis component 214 can use tracked information indicating what client application or device generated an annotation to identify appropriate restrictions for future consumption of the data. In another aspect, certain data consumption modes might be inappropriate for use in certain contexts or environments. According to this aspect, analysis component 214 can use tracked data indicating location and/or context of consumption of a perspective on data to determine restrictions for applications of perspectives on data and/or global terms. Interface component 208 can further employ the conclusions regarding restrictions for consumption when issuing query results to a client. For example, a query result can red flag data sources and perspectives on data which may not be appropriate for a client or simply not offer those data sources and perspectives on data to the client.

In yet another embodiment, analysis component 214 is configured to employ conclusions regarding associations between global terms to facilitate annotating data. According to this aspect, analysis component 214 can facilitate suggested annotations of data sources and/or automatic annotations of data sources. For example, it is possible that a data source is never annotated or minimally annotated. Analysis component 214 is configured to interpret the structure of the un-annotated data and based on observed annotations of similarly structured data, analysis component 214 is configured to fill in the gaps. For example, analysis component 214 can generate suggested annotation of data sources or automatically annotate the data. In another aspect, a client application can annotate data with a first global term. Once the data has a first global term applied, the data description and delivery service via the analysis component 214 can profile the data term to determine additional possible global terms or vocabularies that the client may want to employ.
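One way to picture filling in the gaps for un-annotated data is to compare a simple structural signature against previously annotated data sources. In the sketch below, the signature (the sorted column types), the helper names, and the voting scheme are all assumptions chosen for brevity; the disclosure does not specify how structural similarity is measured.

```python
from collections import Counter


def structure_signature(columns):
    """Hypothetical structural signature: the sorted column type names."""
    return tuple(sorted(type_name for _, type_name in columns))


def suggest_terms(unannotated_columns, annotated_sources, top_n=3):
    """Suggest global terms seen on annotated sources with the same structure.

    `annotated_sources` is a list of (columns, terms) pairs, where `columns`
    is a list of (name, type_name) pairs and `terms` the global terms applied.
    """
    target = structure_signature(unannotated_columns)
    votes = Counter()
    for columns, terms in annotated_sources:
        if structure_signature(columns) == target:
            votes.update(set(terms))  # one vote per source per term
    # Sort by vote count, then alphabetically, so ties are deterministic.
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_n]]


known = [
    ([("name", "str"), ("zip", "int")], ["company name", "zipcode"]),
    ([("who", "str"), ("postal", "int")], ["contact name", "zipcode"]),
    ([("price", "float")], ["amount"]),
]
print(suggest_terms([("col1", "str"), ("col2", "int")], known))
```

A production service would presumably use a richer notion of similarity (value distributions, column names, or the classifiers mentioned in connection with inference component 216); the exact-match signature here only keeps the example short.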

Inference component 216 is configured to assist analysis component 214 in making conclusions regarding data, data sources, definitions of global terms and annotations based on the information held in data store 204. In an aspect, inference component 216 is configured to employ data in data store 204 to determine the following: the degree of popularity of a perspective on data, the quality of a perspective on data, the popularity of a global term, the array of global terms applied to a data source, the reputation of a perspective on data, the clientele distribution of consumption of a perspective on data, the reputation of a data source, the frequency and timing of consumption of a perspective on data, or the location of consumption of a perspective on data. Inference component 216 can further assist analysis component 214 when determining possible annotations for data sources. For example, inference component 216 can account for a variety of factors such as the type of client application, the type of data, the location of the client, the requirements of the client device, recent popular global terms, and/or user preferences, in order to suggest appropriate annotations for data sources.

Inference component 216 employs explicitly and/or implicitly trained classifiers in connection with performing inference and/or probabilistic determinations and/or statistical-based determinations in accordance with one or more aspects of the disclosed subject matter as described herein. For example, the inference component 216 can employ previous annotations of data sources and compare them with annotations of other data sources to automatically determine new annotations of data sources. In another aspect, inference component 216 can infer likely joinings of data sources based on patterns recognized in previous joinings. As a result, data description and delivery service 202 can suggest possible combinations of data sources.

As used herein, the term “infer” or “inference” refers generally to the process of reasoning about, or inferring states of, the system, environment, user, and/or intent from a set of observations as captured via events and/or data. Captured data and events can include user data, device data, environment data, data from sensors, sensor data, application data, implicit data, explicit data, etc. Inference component 216 can be employed to identify a specific context or action, or can generate a probability distribution over states of interest based on a consideration of data and events, for example. For example, inference component 216 can infer a user or client context and tailor query results based on user/client context.

Inference can also refer to techniques employed for composing higher-level events from a set of events and/or data. Such inference results in the construction of new events or actions from a set of observed events and/or stored event data, whether the events are correlated in close temporal proximity, and whether the events and data come from one or several event and data sources. Various classification schemes and/or systems (e.g., support vector machines, neural networks, expert systems, Bayesian belief networks, fuzzy logic, and data fusion engines) can be employed in connection with performing automatic and/or inferred action in connection with the disclosed subject matter.

Definition component 218 is configured to employ conclusions and determinations made by analysis component 214 to generate definitions for global terms. For example, in an aspect, the definition of a global term can be pre-configured by data description and delivery service 102 or 202. In another aspect, the definition of a global term can be provided in association with an annotation or annotation file. For example, the annotator, client or user can not only annotate data with global terms but also provide definitions of those global terms. In another aspect, as noted supra, analysis component 214 can infer the definition of a global term based on its association with the underlying data and patterns of use of the global term. In addition, analysis component 214 can infer requirements of consumption of annotated data that can also be included in a definition of a global term. Definition component 218 therefore extracts possible definitions of global terms from conclusions made by analysis component 214 and creates definitions of global terms in data store 204 or adds to existing definitions of global terms in data store 204.

Filter component 220 is configured to apply the aspects of definitions of global terms relating to requirements for consumption of annotated data in association with interface component 208 to render search queries against data store 204 that are tailored to a client. In particular, filter component 220 is configured to filter query results based on client consumption requirements. For example, in an aspect, filter component 220 can determine appropriate layouts for presentation of data on different client devices or client applications, and determine location specific applications of vocabularies. For example, depending on user device capabilities, certain display visualizations of data may not be supported although represented by an annotation. In another aspect, certain perspectives on data might be inappropriate for use in certain contexts or environments. According to this aspect, analysis component 214 and/or inference component 216 can use tracked data indicating location and/or context of consumption of data according to a perspective to determine restrictions for applications of data models and/or global terms and associate those restrictions with the definitions of respective global terms. In another aspect, a client can implicitly or explicitly provide data description and delivery service 202 with its consumption requirements. For example, a client may indicate the type of program it employs, which in turn can reflect to the service 202 its consumption requirements. In another example, the client may provide consumption requirements in the form of key terms as part of a query request. Interface component 208 can further employ the conclusions regarding restrictions for consumption when issuing query results to a client. For example, a query result can red flag data sources and data models which may not be appropriate for a client or simply not offer those data sources and perspectives on data to the client.
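The two filtering behaviors described above (red flagging unsuitable results, or simply not offering them) can be sketched as follows. The function `filter_results`, the capability strings, and the representation of consumption requirements as a set per global term are hypothetical and serve only to illustrate the idea.

```python
def filter_results(results, term_requirements, client_capabilities, drop_unsupported=True):
    """Tailor query results to a client's consumption requirements.

    `results` is a list of (data_source, global_term) pairs.
    `term_requirements` maps a global term to the set of capabilities that its
    definition requires for consumption (an assumed representation).
    `client_capabilities` is the set of capabilities the client reports.
    """
    tailored = []
    for source, term in results:
        required = term_requirements.get(term, set())
        supported = required <= client_capabilities  # subset test
        if supported or not drop_unsupported:
            # Unsupported results are kept only when flagging is requested.
            tailored.append({"source": source, "term": term, "flagged": not supported})
    return tailored


requirements = {"map": {"graphics"}, "title": set()}
hits = [("stores", "map"), ("stores", "title")]
print(filter_results(hits, requirements, client_capabilities={"text"}))
print(filter_results(hits, requirements, {"text"}, drop_unsupported=False))
```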

Moving now to FIG. 3, illustrated is a flow diagram 300 exemplifying an application of data description and delivery service 102 or data description and delivery service 202. At 302, a user works with an original data source via a client application such as Excel. The original data source includes merely a comma separated list of objects. The user further desires to publish the data source as an Odata feed. At 304, the user clicks on a row in the data source where column names are located, and selects an action called “promote headers” which indicates the column names should be employed as the column names for the Odata feed. As a result, at 306 the client application can automatically annotate the column names with a global term “headers.” At 308, the client application can further request from the data description and delivery service other global terms which are related to the global term “headers.” At 310, the data description and delivery service 102 or 202 can render a query result with related global terms. For example, a query result could include global terms such as “company name,” “contact name,” or “address.” At 312, the user can select a proposed global term and apply it to his data source. For example, the user can select the term “company name” to name a column. Then at 314, following selection and application of the suggested term “company name,” the data description and delivery service can further inform the user of additional data sources that are annotated with the selected global term, “company name.” Then at 316, the data description and delivery service can profile the additional data sources and the user's data source to determine whether there is significant overlap in data content and thus whether a possible join between any of the sources would afford the user additional value.
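The profiling at 316 can be pictured as an overlap measurement on the values of the commonly annotated column. The sketch below is only one possible reading: the overlap measure, the 0.5 threshold standing in for “significant overlap,” and the function names are assumptions not found in the disclosure.

```python
def overlap_ratio(values_a, values_b):
    """Share of the smaller value set that also appears in the other set."""
    a, b = set(values_a), set(values_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def join_candidates(user_values, other_sources, threshold=0.5):
    """Return names of sources whose values overlap the user's significantly.

    `other_sources` maps a data source name to the values of its column that
    carries the same global term (e.g. "company name") as the user's column.
    The threshold is an illustrative assumption.
    """
    return sorted(
        name
        for name, values in other_sources.items()
        if overlap_ratio(user_values, values) >= threshold
    )


mine = ["Acme", "Globex", "Initech"]
others = {"crm": ["Acme", "Globex", "Hooli"], "hr": ["Umbrella"]}
print(join_candidates(mine, others))
```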

Looking at FIG. 4, illustrated is a flow diagram 400 exemplifying another application of data description and delivery service 102 or data description and delivery service 202. According to the example application of data description and delivery service 102 or 202 in FIG. 4, a set of global terms ABC (i.e., a vocabulary) exists and is defined in the data description and delivery service, and the terms are employed to aid in the visualization of data. For example, one of the global terms can indicate what is a “title” so a client application knows what to put in bold, or another one of the global terms can indicate what is a “summary” so the client application knows where and how to display the summary. In addition, in the service there are annotations of the global terms included in the set.

With the above foundation, at 402, a user can employ a client application, for example, Excel. At 404, the user can further employ an application associated with Excel that enables data browsing. In an aspect, the user can select a data source, say data source 253, that is formatted as an Odata feed. At 406, the user can ask the data description and delivery service whether data source 253 has been annotated; in particular, the user can ask the data description and delivery service whether the ABC vocabulary has been applied to data source 253. At 408, the data description and delivery service 102 or 202 can answer the user by providing a query result with links to various perspectives on data for data source 253 that have been modeled in different ways with the ABC vocabulary. At 410, the user can sample and view the various data models and select one for consumption. In another aspect, at 412, the user can select a data source and decide not to employ the perspective on the data associated therewith, and in the alternative annotate the data himself, thus creating a new perspective on the data which he can then publish to the data description and delivery service.

FIGS. 5-8 illustrate various methodologies in accordance with the disclosed subject matter. While, for purposes of simplicity of explanation, the methodologies are shown and described as a series of acts, it is to be understood and appreciated that the disclosed subject matter is not limited by the order of acts, as some acts may occur in different orders and/or concurrently with other acts from that shown and described herein. For example, those skilled in the art will understand and appreciate that a methodology can alternatively be represented as a series of interrelated states or events, such as in a state diagram. Moreover, not all illustrated acts may be required to implement a methodology in accordance with the disclosed subject matter. Additionally, it is to be further appreciated that the methodologies disclosed hereinafter and throughout this disclosure are capable of being stored on an article of manufacture to facilitate transporting and transferring such methodologies to computers.

Referring now to FIG. 5, exemplary method 500 for storing an annotation of a data source and rendering the annotation upon a request for data from the data source is depicted. Generally, at reference numeral 502, a first annotation of data is received for a data source, wherein the first annotation comprises an association of a first global term with the data for the data source. At reference numeral 504, the first annotation is stored. At reference numeral 506, the data is rendered based on the first annotation in response to a request for the data. At this time the method can stop or continue with point A as described with reference to FIG. 6.

Turning now to FIG. 6, depicted is an exemplary method 600 for providing additional features or aspects in connection with storing annotations of a data source and rendering the annotation upon a request for data from the data source. In particular, method 600 depicts the ability of the subject data description and delivery service to collect multiple annotations of data for a single data source, thus enabling community enrichment of data. FIG. 6 picks up the method of FIG. 5 from point A. Generally, at reference numeral 602, a second annotation of the data for the data source is received, wherein the second annotation comprises at least one of a different association of the first global term with the data for the data source or an association of a second global term with the data for the data source. At reference numeral 604, the second annotation is stored. Then, at reference numeral 606, the data is rendered based on the first or second annotation in response to a request for the data.

With reference now to FIG. 7, exemplary method 700 for issuing queries against data description and delivery service is presented. In general, at reference numeral 702, annotations are received, wherein the annotations include associations of global terms with data of data sources. Then at 704, information is stored, the information including descriptions of the data sources, definitions of the global terms, and the annotations. Then a subset of the information is determined based on the annotations. For example, at 706, the subset of information is determined by receiving a global term and determining data sources that have the global term associated with the data thereof. In another aspect, at 708, the subset of information is determined by receiving a data source and determining the global terms associated therewith. Further at 710, the subset of information is determined by receiving a global term and determining data sources that have a related global term associated with the data thereof based on a definition of the global term.
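The three query forms of method 700 can be sketched against a small in-memory index. The `AnnotationStore` class and its method names are hypothetical, and related terms are modeled simply as a set recorded in each term's definition; the disclosure does not mandate this representation.

```python
from collections import defaultdict


class AnnotationStore:
    """Hypothetical in-memory stand-in for the stored information of act 704."""

    def __init__(self):
        self.terms_by_source = defaultdict(set)
        self.sources_by_term = defaultdict(set)
        self.related_terms = defaultdict(set)  # taken from term definitions

    def add_annotation(self, source, term):
        # Acts 702/704: receive and store an association of a term with a source.
        self.terms_by_source[source].add(term)
        self.sources_by_term[term].add(source)

    def define_related(self, term, related):
        # Record related global terms as part of the term's definition.
        self.related_terms[term].update(related)

    def sources_for_term(self, term):
        # Act 706: global term in, data sources out.
        return sorted(self.sources_by_term.get(term, set()))

    def terms_for_source(self, source):
        # Act 708: data source in, global terms out.
        return sorted(self.terms_by_source.get(source, set()))

    def sources_for_related_terms(self, term):
        # Act 710: global term in, sources annotated with a related term out.
        found = set()
        for related in self.related_terms.get(term, set()):
            found |= self.sources_by_term.get(related, set())
        return sorted(found)


store = AnnotationStore()
store.add_annotation("customers", "zipcode")
store.add_annotation("stores", "postal code")
store.define_related("zipcode", {"postal code"})
print(store.sources_for_term("zipcode"), store.sources_for_related_terms("zipcode"))
```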

Referring now to FIG. 8, presented is a process of tracking use of a data description and delivery service for business intelligence analysis. Generally, at reference numeral 802, annotations are received, wherein the annotations include associations of global terms with data of data sources. At reference numeral 804, information is stored, the information including descriptions of the data sources, definitions of the global terms, and the annotations. At reference numeral 806, subsets of the information are generated in response to requests based on the annotations. For example, a user can issue queries against the data store to find appropriate global terms or to find data sources that include particular global terms or vocabularies. The user may use global terms as key words in query searches or definitions of global terms. At reference numeral 808, a selection of at least one of a data source, an annotation, or a global term included in the subsets is received. Then at 810, the selection is tracked. For example, the data description and delivery service, via tracking component 210, can track query requests and selections so that relationships between global terms and data sources can be learned.

Exemplary Networked and Distributed Environments

One of ordinary skill in the art can appreciate that the various embodiments of dynamic composition described herein can be implemented in connection with any computer or other client or server device, which can be deployed as part of a computer network or in a distributed computing environment, and can be connected to any kind of data store where media may be found. In this regard, the various embodiments described herein can be implemented in any computer system or environment having any number of memory or storage units, and any number of applications and processes occurring across any number of storage units. This includes, but is not limited to, an environment with server computers and client computers deployed in a network environment or a distributed computing environment, having remote or local storage.

Distributed computing provides sharing of computer resources and services by communicative exchange among computing devices and systems. These resources and services include the exchange of information, cache storage and disk storage for objects, such as files. These resources and services also include the sharing of processing power across multiple processing units for load balancing, expansion of resources, specialization of processing, and the like. Distributed computing takes advantage of network connectivity, allowing clients to leverage their collective power to benefit the entire enterprise. In this regard, a variety of devices may have applications, objects or resources that may participate in the smooth streaming mechanisms as described for various embodiments of the subject disclosure.

FIG. 9 provides a schematic diagram of an exemplary networked or distributed computing environment. The distributed computing environment comprises computing objects 910, 912, etc. and computing objects or devices 920, 922, 924, 926, 928, etc., which may include programs, methods, data stores, programmable logic, etc., as represented by applications 930, 932, 934, 936, 938. It can be appreciated that computing objects 910, 912, etc. and computing objects or devices 920, 922, 924, 926, 928, etc. may comprise different devices, such as PDAs, audio/video devices, mobile phones, MP3 players, personal computers, laptops, etc.

Each computing object 910, 912, etc. and computing objects or devices 920, 922, 924, 926, 928, etc. can communicate with one or more other computing objects 910, 912, etc. and computing objects or devices 920, 922, 924, 926, 928, etc. by way of the communications network 940, either directly or indirectly. Even though illustrated as a single element in FIG. 9, network 940 may comprise other computing objects and computing devices that provide services to the system of FIG. 9, and/or may represent multiple interconnected networks, which are not shown. Each computing object 910, 912, etc. or computing objects or devices 920, 922, 924, 926, 928, etc. can also contain an application, such as applications 930, 932, 934, 936, 938, that might make use of an API, or other object, software, firmware and/or hardware, suitable for communication with or implementation of the smooth streaming provided in accordance with various embodiments of the subject disclosure.

There are a variety of systems, components, and network configurations that support distributed computing environments. For example, computing systems can be connected together by wired or wireless systems, by local networks or widely distributed networks. Currently, many networks are coupled to the Internet, which provides an infrastructure for widely distributed computing and encompasses many different networks, though any network infrastructure can be used for exemplary communications made incident to the dynamic composition systems as described in various embodiments.

Thus, a host of network topologies and network infrastructures, such as client/server, peer-to-peer, or hybrid architectures, can be utilized. The “client” is a member of a class or group that uses the services of another class or group to which it is not related. A client can be a process, i.e., roughly a set of instructions or tasks, that requests a service provided by another program or process. The client process utilizes the requested service without having to “know” any working details about the other program or the service itself.

In a client/server architecture, particularly a networked system, a client is usually a computer that accesses shared network resources provided by another computer, e.g., a server. In the illustration of FIG. 9, as a non-limiting example, computing objects or devices 920, 922, 924, 926, 928, etc. can be thought of as clients and computing objects 910, 912, etc. can be thought of as servers where computing objects 910, 912, etc. provide data services, such as receiving data from client computing objects or devices 920, 922, 924, 926, 928, etc., storing of data, processing of data, transmitting data to client computing objects or devices 920, 922, 924, 926, 928, etc., although any computer can be considered a client, a server, or both, depending on the circumstances. Any of these computing devices may be processing data, or requesting transaction services or tasks that may implicate the techniques for dynamic composition systems as described herein for one or more embodiments.

A server is typically a remote computer system accessible over a remote or local network, such as the Internet or wireless network infrastructures. The client process may be active in a first computer system, and the server process may be active in a second computer system, communicating with one another over a communications medium, thus providing distributed functionality and allowing multiple clients to take advantage of the information-gathering capabilities of the server. Any software objects utilized pursuant to the techniques for performing read set validation or phantom checking can be provided standalone, or distributed across multiple computing devices or objects.

In a network environment in which the communications network/bus 940 is the Internet, for example, the computing objects 910, 912, etc. can be Web servers with which the client computing objects or devices 920, 922, 924, 926, 928, etc. communicate via any of a number of known protocols, such as the hypertext transfer protocol (HTTP). Servers 910, 912, etc. may also serve as client computing objects or devices 920, 922, 924, 926, 928, etc., as may be characteristic of a distributed computing environment.
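By way of non-limiting illustration, the HTTP exchange between a client and a Web server described above might be sketched in Python as follows. The resource path /sources, the query parameter term, the sample annotation data, and the helper names start_server and find_sources are hypothetical and are chosen only for this example; they are not taken from the embodiments described herein.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, quote, urlparse

# Hypothetical sample data held by the server: global term -> data sources.
ANNOTATIONS = {
    "postal_code": ["customers", "stores"],
    "currency": ["orders"],
}


class AnnotationHandler(BaseHTTPRequestHandler):
    """Server side: answers GET /sources?term=<global term> with JSON."""

    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/sources":
            self.send_error(404, "unknown resource")
            return
        term = parse_qs(url.query).get("term", [""])[0]
        payload = {"term": term, "sources": ANNOTATIONS.get(term, [])}
        body = json.dumps(payload).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging to keep the example quiet


def start_server():
    """Start the server on an OS-assigned local port in a background thread."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), AnnotationHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def find_sources(port, term):
    """Client side: request the service without knowing how the server works."""
    connection = HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        connection.request("GET", "/sources?term=" + quote(term))
        response = connection.getresponse()
        return json.loads(response.read())["sources"]
    finally:
        connection.close()


if __name__ == "__main__":
    server = start_server()
    try:
        print(find_sources(server.server_address[1], "postal_code"))
    finally:
        server.shutdown()
        server.server_close()
```

In this sketch the client only needs the address of the server and the shape of the request; how the server stores or looks up its data remains hidden from the client, consistent with the client/server roles discussed above.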

Exemplary Computing Device

As mentioned, advantageously, the techniques described herein can be applied to any device where it is desirable to perform dynamic composition. It is to be understood, therefore, that handheld, portable and other computing devices and computing objects of all kinds are contemplated for use in connection with the various embodiments, i.e., anywhere that a device may wish to read or write transactions from or to a data store. Accordingly, the general purpose remote computer described below in FIG. 10 is but one example of a computing device. Additionally, a database server can include one or more aspects of the below general purpose computer, such as a media server or consuming device for the dynamic composition techniques, or other media management server components.

Although not required, embodiments can partly be implemented via an operating system, for use by a developer of services for a device or object, and/or included within application software that operates to perform one or more functional aspects of the various embodiments described herein. Software may be described in the general context of computer executable instructions, such as program modules, being executed by one or more computers, such as client workstations, servers or other devices. Those skilled in the art will appreciate that computer systems have a variety of configurations and protocols that can be used to communicate data, and thus, no particular configuration or protocol is to be considered limiting.

FIG. 10 thus illustrates an example of a suitable computing system environment 1000 in which one or more aspects of the embodiments described herein can be implemented, although as made clear above, the computing system environment 1000 is only one example of a suitable computing environment and is not intended to suggest any limitation as to scope of use or functionality. Neither is the computing environment 1000 to be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment 1000.

With reference to FIG. 10, an exemplary remote device for implementing one or more embodiments includes a general purpose computing device in the form of a computer 1010. Components of computer 1010 may include, but are not limited to, a processing unit 1020, a system memory 1030, and a system bus 1022 that couples various system components including the system memory to the processing unit 1020.

Computer 1010 typically includes a variety of computer readable media, which can be any available media that can be accessed by computer 1010. The system memory 1030 may include computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) and/or random access memory (RAM). By way of example, and not limitation, memory 1030 may also include an operating system, application programs, other program modules, and program data.

A user can enter commands and information into the computer 1010 through input devices 1040. A monitor or other type of display device is also connected to the system bus 1022 via an interface, such as output interface 1050. In addition to a monitor, computers can also include other peripheral output devices such as speakers and a printer, which may be connected through output interface 1050.

The computer 1010 may operate in a networked or distributed environment using logical connections to one or more other remote computers, such as remote computer 1070. The remote computer 1070 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, or any other remote media consumption or transmission device, and may include any or all of the elements described above relative to the computer 1010. The logical connections depicted in FIG. 10 include a network 1072, such as a local area network (LAN) or a wide area network (WAN), but may also include other networks/buses. Such networking environments are commonplace in homes, offices, enterprise-wide computer networks, intranets and the Internet.

As mentioned above, while exemplary embodiments have been described in connection with various computing devices and network architectures, the underlying concepts may be applied to any network system and any computing device or system in which it is desirable to publish or consume media in a flexible way.

Also, there are multiple ways to implement the same or similar functionality, e.g., an appropriate API, tool kit, driver code, operating system, control, standalone or downloadable software object, etc., which enables applications and services to take advantage of the dynamic composition techniques. Thus, embodiments herein are contemplated from the standpoint of an API (or other software object), as well as from a software or hardware object that implements one or more aspects of the smooth streaming described herein. Thus, various embodiments described herein can have aspects that are wholly in hardware, partly in hardware and partly in software, as well as in software.

The word “exemplary” is used herein to mean serving as an example, instance, or illustration. For the avoidance of doubt, the subject matter disclosed herein is not limited by such examples. In addition, any aspect or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects or designs, nor is it meant to preclude equivalent exemplary structures and techniques known to those of ordinary skill in the art. Furthermore, to the extent that the terms “includes,” “has,” “contains,” and other similar words are used in either the detailed description or the claims, for the avoidance of doubt, such terms are intended to be inclusive in a manner similar to the term “comprising” as an open transition word without precluding any additional or other elements.

Computing devices typically include a variety of media, which can include computer-readable storage media and/or communications media, in which these two terms are used herein differently from one another as follows. Computer-readable storage media can be any available storage media that can be accessed by the computer, is typically of a non-transitory nature, and can include both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer-readable storage media can be implemented in connection with any method or technology for storage of information such as computer-readable instructions, program modules, structured data, or unstructured data. Computer-readable storage media can include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or other tangible and/or non-transitory media which can be used to store desired information. Computer-readable storage media can be accessed by one or more local or remote computing devices, e.g., via access requests, queries or other data retrieval protocols, for a variety of operations with respect to the information stored by the medium.

On the other hand, communications media typically embody computer-readable instructions, data structures, program modules or other structured or unstructured data in a data signal such as a modulated data signal, e.g., a carrier wave or other transport mechanism, and include any information delivery or transport media. The term “modulated data signal” or signals refers to a signal that has one or more of its characteristics set or changed in such a manner as to encode information in one or more signals. By way of example, and not limitation, communications media include wired media, such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media.

As mentioned, the various techniques described herein may be implemented in connection with hardware or software or, where appropriate, with a combination of both. As used herein, the terms “component,” “system” and the like are likewise intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computer and the computer can be a component. One or more components may reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers.

The aforementioned systems have been described with respect to interaction between several components. It can be appreciated that such systems and components can include those components or specified sub-components, some of the specified components or sub-components, and/or additional components, and according to various permutations and combinations of the foregoing. Sub-components can also be implemented as components communicatively coupled to other components rather than included within parent components (hierarchical). Additionally, it is to be noted that one or more components may be combined into a single component providing aggregate functionality or divided into several separate sub-components, and that any one or more middle layers, such as a management layer, may be provided to communicatively couple to such sub-components in order to provide integrated functionality. Any components described herein may also interact with one or more other components not specifically described herein but generally known by those of skill in the art.

In view of the exemplary systems described supra, methodologies that may be implemented in accordance with the described subject matter will be better appreciated with reference to the flowcharts of the various figures. While for purposes of simplicity of explanation, the methodologies are shown and described as a series of blocks, it is to be understood and appreciated that the claimed subject matter is not limited by the order of the blocks, as some blocks may occur in different orders and/or concurrently with other blocks from what is depicted and described herein. Where non-sequential, or branched, flow is illustrated via flowchart, it can be appreciated that various other branches, flow paths, and orders of the blocks, may be implemented which achieve the same or a similar result. Moreover, not all illustrated blocks may be required to implement the methodologies described hereinafter.

In addition to the various embodiments described herein, it is to be understood that other similar embodiments can be used or modifications and additions can be made to the described embodiment(s) for performing the same or equivalent function of the corresponding embodiment(s) without deviating therefrom. Still further, multiple processing chips or multiple devices can share the performance of one or more functions described herein, and similarly, storage can be effected across a plurality of devices. Accordingly, the invention is not to be limited to any single embodiment, but rather can be construed in breadth, spirit and scope in accordance with the appended claims.
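By way of non-limiting illustration, one of many possible ways to receive annotations, store them together with descriptions of data sources and definitions of global terms, and determine subsets of that information is sketched below in Python. The class names Annotation and Catalog, their fields and methods, and the sample data are hypothetical and are introduced only for this example; the sketch assumes a simple in-memory store and is not intended to represent any particular embodiment.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    """Association of a global term with data of a data source."""
    source: str   # name of the data source
    element: str  # annotated data, e.g. a column name (hypothetical granularity)
    term: str     # global term associated with that data


@dataclass
class Catalog:
    """Hypothetical in-memory store for descriptions, definitions, annotations."""
    descriptions: dict = field(default_factory=dict)  # data source -> description
    definitions: dict = field(default_factory=dict)   # global term -> definition
    annotations: set = field(default_factory=set)
    selections: Counter = field(default_factory=Counter)

    def receive(self, annotation):
        """Receive and store one annotation."""
        self.annotations.add(annotation)

    def sources_for(self, term):
        """Given a global term, determine the data sources annotated with it."""
        return sorted({a.source for a in self.annotations if a.term == term})

    def terms_for(self, source):
        """Given a data source, determine the global terms associated with it."""
        return sorted({a.term for a in self.annotations if a.source == source})

    def select(self, source):
        """Track that a client selected a data source from a returned subset."""
        self.selections[source] += 1


if __name__ == "__main__":
    catalog = Catalog()
    catalog.definitions["postal_code"] = "Code identifying a mail delivery area"
    catalog.descriptions["customers"] = "Customer master records"
    catalog.receive(Annotation("customers", "zip", "postal_code"))
    catalog.receive(Annotation("stores", "postcode", "postal_code"))
    print(catalog.sources_for("postal_code"))
```

The two lookups mirror the two directions of discovery described herein, from a global term to data sources and from a data source to its global terms, while the selection counter shows one simple way that selections could be tracked.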

What is claimed is:
 1. A method comprising: receiving annotations, wherein the annotations include associations of global terms with data of data sources; storing information, the information including, descriptions of the data sources, definitions of the global terms, and the annotations; and determining a subset of the information based on the annotations.
 2. The method of claim 1, wherein the determining the subset of information includes receiving a global term and determining data sources that have the global term associated with the data thereof.
 3. The method of claim 1, wherein the determining the subset of information includes receiving a data source and determining the global terms associated therewith.
 4. The method of claim 1, wherein the determining the subset of information includes receiving a global term and determining data sources that have a related global term associated with the data thereof based on a definition of the global term.
 5. The method of claim 1, wherein the global terms convey properties of the data for respective data sources.
 6. The method of claim 1, wherein the global terms convey manners of use of the data of the respective data sources by a client.
 7. The method of claim 1, wherein the global terms convey manners in which to present the data of the respective data sources by a client.
 8. The method of claim 1, wherein the global terms convey meanings about the data for the respective data sources, wherein the meanings are recognized by at least two different clients.
 9. A method comprising: receiving annotations, wherein the annotations include associations of global terms with data of data sources; storing information, the information including, descriptions of the data sources, definitions of the global terms, and the annotations; generating subsets of the information in response to requests based on the annotations; receiving selections of at least one of a data source, an annotation, or a global term included in the subsets; and tracking the selections.
 10. The method of claim 9, further comprising: determining a relationship between respective requests and respective selections; and determining the subsets of the information based on the relationship.
 11. The method of claim 9, further comprising: determining preferences for at least one of the data sources, the annotations, or the global terms in response to the tracking.
 12. The method of claim 9, wherein tracking the selections includes tracking the types of clients affiliated with the selections.
 13. The method of claim 9, further comprising: receiving ratings of at least one of the data sources or the annotations; and employing the ratings to determine at least one of: a popularity of respective data sources, a popularity of respective annotations, a quality of respective data sources, or a quality of respective annotations.